2025.10.13 | Desktop-interaction pretraining unlocks robot potential; a unified model gives cameras spatial imagination
Description
The 14 papers in this episode are as follows:
[00:20] 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
[01:13] 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation
[01:56] 🎨 TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling
[02:31] 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
[03:05] 🚀 AutoPR: Let's Automate Your Academic Promotion!
[03:39] 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
[04:14] 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
[04:56] 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km
[05:37] 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams
[06:19] 🌐 KORMo: Korean Open Reasoning Model for Everyone
[06:42] ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting
[07:25] 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
[08:16] ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
[08:56] 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction
[Follow us]
You can also find us on the following platforms for more information beyond the podcast content
Xiaohongshu: AI速递